STANFORD PROGRAMS 1 INTRODUCTION
STANFORD PROGRAM LIBRARY
Or: How to Feel Like you can Do something
Without really having to
This document describes the programs (not subroutines) from the Stanford
Computer Music Group. These include programs for editing and mixing of sound
files, for doing several kinds of analysis on sound files, and for some types
of synthesis as well.
STANFORD PROGRAMS 2 ABOUT HEADERS
ABOUT HEADERS
Sound files either have headers or they don't. A header is a 128-word block
that tells several things about the file: what the sampling rate is, how it is
packed (12 bits to a sample, 18 bits to a sample, or 36-bit floating point),
how many channels it has (1 to 4 currently), and what the maximum amplitude
is. It can also contain a text comment.
Most programs call the same subroutine to read a sound file name for input.
This subroutine looks at the sound file and sees whether it has a header or
not. If it does have a header, it returns the header data to the calling
program and prints out the comment text. If it does not have a header, then it
interrogates the user directly. Usually it does so by asking three questions:
what the sampling rate is, what the packing is, and how many channels it has.
The sampling rate is usually defaulted to 25,600 Hz if you just type C.R.
(carriage return), the packing to 12-bit integer, and the number of channels
to monaural. Generally for the sampling rate, you can just type the number of
kHz, and if it is less than 100, it will automatically be multiplied by 1000
to convert to Hz. For the packing, 0 usually stands for 12-bit, 1 for 18-bit,
and 3 for 36-bit floating point.
If your sound file does not have a header on it and you wish it did, you can
run a program called HEADER and put one on it.
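For concreteness, the header information might be pictured as a record like
the following C sketch. The field names, order, and sizes here are purely
illustrative; the actual layout of the 128-word block is not spelled out in
this document.

  /* Illustrative only: the real header is a 128-word block whose exact
     layout is not documented here.  Field names and sizes are made up. */
  struct snd_header {
      int    srate;         /* sampling rate in Hz, e.g. 25600             */
      int    packing;       /* 0 = 12-bit, 1 = 18-bit, 3 = 36-bit floating */
      int    nchannels;     /* 1 to 4                                      */
      double maxamp;        /* maximum amplitude in the file               */
      char   comment[400];  /* text comment (size is a guess)              */
  };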
There are at least two formats for data, such as analysis or synthesis data.
There are merge files (sometimes abbreviated MRG files) which are (often)
large files of binary data. They consist of a number of separate sampled-data
functions, each with its own name, length, and possibly sampling rate. There
are programs that examine merge files (EXMRG) and display the various
functions, there are programs that can read and edit the various functions
(FUNED), and there are subroutines for dealing with merge files (MRGPAK).
There is also the SEG-type file. This is a text file that can be read into the
NEWMUS compiler. It is used for piecewise-linear functions. There exist
programs for editing these (FUNED) as well as converting between this form and
MRG file form. Most of the heavyweight analysis programs produce MRG files as
output. Most of the music synthesis programs take SEG-type functions as input.
All of the sources for these programs are on [MA,JAM] on the MUSIC0 UDP. They
use library routines which may be found in JAMLIB.REL[SUB,SYS], the sources of
which are on [LIB,JAM] on said UDP. They also use MRGPAK, which is a package
for binary data file manipulation; its source is also on [MA,JAM].
STANFORD PROGRAMS 4 ANALYSIS PROGRAMS
ANALYSIS PROGRAMS
The analysis programs come in several varieties.
There is S, which is a general purpose interactive analysis program, capable
of viewing sound files and taking discrete Fourier transforms of subsections.
This is good when you are just sort of curious about what is going on in a
particular sound file.
There is HANAL, which is useful for people working with MUSIC V or NEWMUS.
This is primarily designed for single tones from either orchestral instruments
or short vocal tones. The program computes the amplitude envelope of the tone,
then at the maximum amplitude, it takes the Fourier transform. Its output is a
SEG type function that can be read directly into NEWMUS as a function. This is
useful for trying to simulate a given tone in NEWMUS.
There is PVCOMP, which is not the least bit interactive. It does the phase
vocoder analysis, which is a time-variant discrete Fourier transform. This
program is coupled with DFSYN, which takes this data and turns it into
magnitude-frequency form. This latter form is the most useful for intuition
and for synthesis. DFSYN can also resynthesize the tone either at the original
pitch or at some other pitch.
There is FLTCMP, which computes the linear prediction coefficients for an
entire sound file. It does so by taking a fixed window and stepping this
window through time by equal increments. The method used is Burg's maximum
entropy method. This joins with FLTAPP, for applying the filter thus computed
to a sound file.
PITCH computes the pitch of a sound file by autocorrelation. It steps through
the sound file and gets the pitch at each point in time. You must bound the
search by supplying the range in which the pitch will most likely lie.
ANALYSIS PROGRAMS 5 S
S - AN INTERACTIVE BROWSING PROGRAM
Commands to S are single character commands with control, meta, or both
depressed. We will abbreviate below control by the character α, meta by β,
and control-meta by αβ. Most commands then ask various questions about what
is desired. The first thing you must do is specify a file name. This is done
by αI. It will ask you for a file name.
All commands that take values, like αB or αE below, will accept <RETURN> as
meaning "dont change the value after all" and will leave the number unchanged.
αI Set an input file.
αβI Close the input file.
αO Open output file
αC Copy input file to output file (closes output file)
This will copy the input file from the time specified by the begin time (see
αB below) to the end time (see αE below). Unless you specify otherwise, it
will always put a header on the output file (use the Xtend command -OH to
prevent it).
αB Set begin time
αE Set end time
On setting times in S: Times are given in seconds. In addition, times can be
given in samples by preceding the decimal sample number with the letter "S".
For instance, the times 1.0 and S25600 correspond to the same sample if the
sampling rate is 25.6 KHz. The time value is not converted to sample number
until the last instant, so if you change sampling rates after setting begin
and end times, these times will still be valid.
One can also type ∞, which stands for the sample number of the last sample in
the file.
αS Show waveform between begin time and end time
βS Advance times by 75 percent of their difference and show
αβS Move back above amount and show
λ( Move right 2↑(λ/2) units and Show
λ) Move left 2↑(λ/2) units and Show
λ/ Make window smaller by 2↑(λ/2)
λ\ Make window bigger by 2↑(λ/2)
This business is a way to specify a number along with the command. That is,
the "λ" above is a number that is the binary number specified by the control,
meta, and top keys. α is 1, β is 2, αβ is 3, top-α is 4, top-β is 5, top-αβ
is 6. Thus if you type "[" instead of "(", or "]" instead of ")", it
effectively adds three more to the total.
All the commands that display the sound file produce a file on the disk that
contains the picture itself. The file name is always SIG.PLT. You can look at
this file by using the αD command below - that command displays a plot file.
You can rename the plot file to something else using the βN command below.
αF Take DFT of sound within window (between begin and end times)
βF Take Cepstrum of sound within window
αβF Take Autocorrelation of sound within window
Various window functions can be applied while doing these transforms. The
window function options can be changed with the Xtend commands SWINDOW and
TWINDOW.
All these transforming routines produce a plot file called FFT.PLT - in
addition, they produce a "pseudo-sound" file called FFT.PS. The purpose of
this creation is to allow you to read it in, just like a sound file, with αI,
then to look at it more closely with α/ or αS as you will. The horizontal axis
for the FFT is in KHz (not Hz), so the beginning or ending "times" you specify
for αS are actually in this case in terms of KHz.
αL Filter a file
You get your choice of lo-pass, hi-pass, band-pass, and band-stop filters of
either Butterworth or Chebyshev characteristics. Don't forget that you must
set the begin and end times of the input file so the filtering will take some
non-zero portion of the input file.
Since filtering a file does somewhat unpredictable things to the amplitude, it
is usually the best idea to set the output packing mode to 36-bit floating
point to avoid amplitude over or underflow. You can later convert the floating
point to 12-bit integer using the copy command, αC. It will also renormalize
the file to exactly -1 to +1 in amplitude.
Note that one very handy use of the filtering routine is to remove 50-Hz hum.
To do this, you use a bandstop filter from maybe 40 to 60 Hz, or even a hi-
pass filter from 60 Hz on up (a sketch of such a filter follows the command
list below).
αP Apply optimum-comb algorithm
αD Display a plot file
βD Display it again
αM Compute the maximum value of a sound file
αN Rename a file
βN Rename the signal plot file SIG.PLT
αβN Rename the FFT file FFT.PLT
αR Reverberate a sound file
αH or α? HELP. Types out list of commands
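Here is the hum-filter sketch promised under the αL command above: in C-like
notation, a second-order Butterworth-style high-pass at 60 Hz might look like
this. It is only an illustration of the idea, not the filter design S itself
uses.

  #include <math.h>

  /* Sketch of a 60 Hz high-pass biquad with Butterworth Q (standard
     bilinear-transform design), applied to n samples at rate fs. */
  void highpass60(const float *x, float *y, long n, double fs)
  {
      const double PI = 3.141592653589793;
      double w0 = 2.0 * PI * 60.0 / fs;
      double alpha = sin(w0) / (2.0 * 0.7071);        /* Q = 1/sqrt(2) */
      double b0 = (1.0 + cos(w0)) / 2.0, b1 = -(1.0 + cos(w0)), b2 = b0;
      double a0 = 1.0 + alpha, a1 = -2.0 * cos(w0), a2 = 1.0 - alpha;
      double xm1 = 0, xm2 = 0, ym1 = 0, ym2 = 0;
      for (long i = 0; i < n; i++) {
          double out = (b0*x[i] + b1*xm1 + b2*xm2 - a1*ym1 - a2*ym2) / a0;
          xm2 = xm1; xm1 = x[i];
          ym2 = ym1; ym1 = out;
          y[i] = (float)out;
      }
  }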
αX is the "extended" command. It answers "yes? " and expects you to type a 1-
to-6 letter command. The commands are as follows:
ICLOCK Set input clock rate
OCLOCK Set output clock rate
CLOCK Set both clock rates
INCHAN Set number of channels on input file
ONCHAN Set number of channels on output file
NCHAN Set number of channels on both files
AUTOIN Turn on automatic file name incrementing mode
IPACK Set input packing mode
OPACK Set output packing mode
PACK Set both packing modes
OHEADE Make output file have a header
TEXT Set output header text
PBOOL Print out values of all boolean variables
CHANNE Set current channel <not used>
DMODE Set display mode (average, sample, direct, envelope)
Display mode applies to all time function displays (rather than transforms).
These modes determine what to do if there are more points in the window
(between the begin and end times) than there are on the screen (1024 maximum).
AVERAGE says display the average of the points around the point in question.
This is the default mode and is usually the right thing, except that for very
large windows, it will give the appearance of attenuating the signal. SAMPLE
will just resample the signal at the proper rate to get 1024 points on the
screen. This preserves the original amplitude spread, but sometimes leaves out
important points, especially if the sound being displayed is full of spikes
and impulses. DIRECT mode displays all the points anyway with no data
reduction at all. This usually gets display errors for exceeding the maximum
buffer size. ENVELOPE takes the maximum over each group of points (the number
of points in a group is specified by the WINDOW Xtend command below; the
default is 750 points, I think). This is really only useful in
conjunction with ABMODE, which says to take the absolute value of the sound
file first. The combination of ABMODE and ENVELOPE display mode gives the
amplitude envelope of the sound waveform. (These data reductions are sketched
in C-like form after the command list below.)
ABMODE Make function non-negative before displaying
PMODE Makes display use endpoint vectors
WINDOW Sets averaging window width
DPCHAN Sets which channel of multi-channel input file to display
NOXAXI Turns off display of x-axes on all displays
NOYAXI Turns off display of y-axes on all displays
WFLAG Specifies how FFT window relates to begin time & end time
SWINDOW Set windowing function to use on sound wave (for FFT)
TWINDOW Set windowing function to use on transform (AUTOC & CEP)
LOGF Makes FFT display of LOG of xfm
SELECT Select input file
DPSCALE Set display scaling
SLAM Slam bottom of display to zero
CUTOFF Set reverberator output cutoff
REVTIM Set reverberation output file length
Some of these things specify values and others are just true or false (boolean
variables). The Booleans are AUTOIN, OHEAD, ABMODE, PMODE, NOXAXIS, NOYAXIS,
LOGF. You can print out the values of all the boolean variables with the PBOOL
command.
All of these Xtended commands can be abbreviated to the smallest unique set of
letters. For instance, ICLOCK and OCLOCK may be abbreviated as IC and OC.
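Here is the sketch of the DMODE data reductions promised above (AVERAGE,
SAMPLE, and ENVELOPE; DIRECT simply plots everything, so it is not shown).
It is only meant to make the three ideas concrete; S's actual display code is
certainly organized differently, and the WINDOW group size is folded into a
single group size here.

  #include <math.h>

  #define SCREEN 1024   /* maximum points on the screen */

  /* mode: 0 = AVERAGE, 1 = SAMPLE, 2 = ENVELOPE; absfirst mimics ABMODE. */
  void reduce(const float *x, long npts, float *out, int mode, int absfirst)
  {
      long group = npts / SCREEN;          /* input samples per screen point */
      if (group < 1) group = 1;
      for (long i = 0; i < SCREEN && i * group < npts; i++) {
          const float *p = x + i * group;
          float v = absfirst ? fabsf(p[0]) : p[0];
          if (mode == 0) {                 /* AVERAGE: mean of the group     */
              double s = 0.0;
              for (long j = 0; j < group; j++)
                  s += absfirst ? fabsf(p[j]) : p[j];
              v = (float)(s / group);
          } else if (mode == 2) {          /* ENVELOPE: max over the group   */
              for (long j = 1; j < group; j++) {
                  float s = absfirst ? fabsf(p[j]) : p[j];
                  if (s > v) v = s;
              }
          }                                /* SAMPLE: keep the first point   */
          out[i] = v;
      }
  }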
Since reverberation is more complicated, we have reserved its explanation for
down here. The αR command will either filter an input file with a reverberator
of your choice, or will just produce the impulse response of a reverberator.
Eventually, it will come down to specifying the details of the reverberator.
You usually do so in a specification file. Such a file will accept the
following commands:
Conventions are:
"i" stands for an integer, as does "j"
"x" is a floating-point number
all commands are terminated by CRLF
Commands are:
NCOMBS=i Sets Number of parallel comb filters to i
CMBLEN[i]=j Sets delay of comb i to j samples
CMBG[i]=x Sets gain of comb i to x
N1ALPS=i Sets number of series 1-pole allpasses to i
ALP1L[i]=j Sets delay of 1-pole allpass i to j samples
ALP1G[i]=x Sets gain of 1-pole allpass i to x
N2ALPS=i Sets number of 2-pole allpasses to i
ALP2L[i]=j Sets delay of 2-pole allpass i to j samples
ALP2F[i]=x Sets oscillation frequency of 2-pole allpass i to x
ALP2G[i]=x Sets decay of 2-pole allpass i to x
DIRECT=x Sets fraction direct sound to x
PRINT Print out the values of all the parameters so far
EXIT Go on to computation
HELP Print this message
ABORT Get out of REV gracefully, back to command loop
For instance, this is a typical reverberation file and can be used directly to
set up an allpass reverberator for you:
NCOMBS=0
N1ALPS=5
N2ALPS=0
ALP1L[1]=1597
ALP1G[1]=.750
ALP1L[2]=1117
ALP1G[2]=.720
ALP1L[3]=787
ALP1G[3]=.692
ALP1L[4]=509
ALP1G[4]=.671
ALP1L[5]=331
ALP1G[5]=.651
DIRECT=.85
EXIT
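The file above asks for five one-pole (Schroeder-type) allpass sections in
series plus 85 percent direct sound. One such section, with its delay in
samples and its gain, can be sketched in C like this; this is the textbook
structure, not necessarily the exact code behind the αR command:

  /* One allpass section: y[n] = -g*x[n] + x[n-D] + g*y[n-D].
     buf must hold len samples and start out zeroed. */
  typedef struct { float *buf; int len, pos; float g; } allpass;

  float allpass_tick(allpass *a, float in)
  {
      float delayed = a->buf[a->pos];           /* x[n-D] + g*y[n-D]   */
      float out = delayed - a->g * in;          /* allpass output      */
      a->buf[a->pos] = in + a->g * out;         /* feed the delay line */
      if (++a->pos >= a->len) a->pos = 0;
      return out;
  }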
ANALYSIS PROGRAMS 12 HANAL
HANAL - QUICKIE ANALYSIS FOR ENVELOPE, SPECTRUM, AND PITCH
This program is for people using MUSCMP or MUSIC V who just have a sound file
of a single note and want to capture the amplitude envelope of the note as
well as the heights of the harmonics at some single point in the file. This is
a very simple-minded program.
This program requires the use of graphics, so you must be on a PDP-11 graphics
terminal to use it.
R HANAL
Input SND file: <type name of sound file containing single note>
Output MRG file: <make up a MRG file name here>
Function name in MRG file: <can use same name as file name>
Estimated fundamental frequency: <your best guess. Not critical>
<here it puts up a picture of the envelope>
<after viewing the picture, type C.R. to continue>
do envelope approximation? <if interested in envelope, type Y>
<shows you pictures of envelope and approximation>
FFT will be taken at .1013 <type C.R. to accept this time>
Is this OK? <this is where it will take FFT>
<puts up picture. Type Y to accept time>
After this, it is all automatic. The procedure is this. After you tell it your
best guess at the fundamental frequency, it puts up a picture of the amplitude
envelope. After you get tired of staring at this picture, you type C.R. It
will then start doing piecewise-linear approximations to the envelope. It will
do 20 approximations, putting up a picture after each one. After it finishes
with that, it will type out the time of the maximum point of the envelope. At
this point, you should probably type C.R. It will then put up a picture of the
actual waveform centered around that point. If you think that is a good place
to take an FFT (nice and regular) then type "Y" and it will go on. Otherwise,
type C.R., and it will ask you for the time to take the FFT. You can go
through this as many times as you like. Once is usually enough for most folks.
After this, the program exits. There will be a file <name>.TXT on your area
that contains the piecewise-linear approximations, the harmonic amplitudes,
and the pitch of the signal. The name of the file will be the same as that of
the MRG file. There will also be a MRG file with the entire envelope in it.
ANALYSIS PROGRAMS 14 PVCOMP
PVCOMP - PHASE VOCODER ANALYSIS
The "Phase Vocoder" is somewhat of a misnomer. It refers to a technique for
doing time-varying Fourier-like analysis on continuous sound. We can think of
the process as applying a bank of bandpass filters to the input signal. The
channel filters are all identical. They are each formed from a single
prototype filter that is shifted to equally spaced points across the sampling
rate by heterodyning. Each filter has a complex impulse response, so the
output of each channel is in phase quadrature. Further information can be
obtained by reading "Realization of the Phase Vocoder" by Michael Portnoff,
IEEE Trans. on Acoustics, Speech, and Signal Processing, June, 1976, and "Use of
the Phase Vocoder in Computer Music Applications" by James Moorer, Audio
Engineering Society preprint, October, 1976.
Anyway, this program, combined with DFSYN, and other display programs, can
give time-variant pictures of the spectra of sound files. Each channel can be
viewed separately or together in perspective.
You use the program as follows:
R PVCOMP
Input SND file: <type input file name here>
Reading from file xxxxx
Output PV file: <type output file name here>
Writing on file xxxxx
<types out file header information here>
Fundamental (or lowest) frequency: <see note below>
Number of windows/2 in average: <see note below>
After this, the program will compute for a long time (about 1 minute for every
second of sound) and will eventually exit. At this time, you will have an
output PV file, called whatever you typed for the answer to the second
question. Probably the next thing you want to do is use DFSYN to turn this
into a magnitude-frequency form.
If the sound file is only one pitch (like an isolated tone from an
instrument), then you should find the pitch with S (use the DFT feature) and
type in exactly the fundamental frequency here. On tones with vibrato, you can
type roughly the center frequency. In these cases, channels will correspond to
harmonics.
If the sound file has highly variable pitch, like in normal speech, then you
must type the lowest pitch the sound achieves for this question. This will
assure that each channel will capture no more than one harmonic. There may
thus be channels with no signal present, but this is OK.
Number of windows in the average. This determines how sharp the filters will
be. A large number here (like 4, 6, or 20) will make very sharp filters, but
they will extend over a great period of time. I have been favoring small
numbers lately; for instance, 4 or so seems to work well.
The output of this program is a merge file (sometimes abbreviated as a MRG
file). Presumably, the format of a merge file is described in an appendix
somewhere.
The number of channels that will be used will be one half of the sampling rate
divided by the fundamental (or lowest) frequency (rounded to the nearest
integer) plus one. For instance, with a sampling rate of 25,600 Hz, and a
minimum pitch of 64 Hz, there will be 201 channels, spaced exactly 64 Hz
apart, from 0 to 12,800 Hz. The length of each output function is determined
by the length of the input file and the number of channels. The length of each
function will be the length of the sound file, in seconds, times the
fundamental (or lowest) frequency, times two.
In the MRG file, the functions will be named "REAL.n" and "IMAG.n" where n is
the channel number from 0 to the maximum.
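The bookkeeping in the last two paragraphs, worked out in C for the 25,600 Hz,
64 Hz example in the text (the two-second duration is just a made-up
illustration):

  #include <stdio.h>

  int main(void)
  {
      double srate = 25600.0, f0 = 64.0;      /* values from the text       */
      double filedur = 2.0;                   /* example duration, seconds  */
      int  nchan = (int)(srate / 2.0 / f0 + 0.5) + 1;   /* = 201 channels   */
      long flen  = (long)(filedur * f0 * 2.0);          /* points/function  */
      printf("%d channels, spaced %g Hz apart, from 0 to %g Hz\n",
             nchan, f0, srate / 2.0);
      printf("each REAL.n and IMAG.n function has %ld points\n", flen);
      return 0;
  }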
ANALYSIS PROGRAMS 16 DFSYN
DFSYN - PHASE VOCODER MAGNITUDE-FREQUENCY CONVERSION
PVCOMP produces a file of analysis data that represents the outputs of a
number of bandpass filters that are equally spaced in frequency; DFSYN
converts this to the amplitude and frequency form. If what is coming out
of a given bandpass filter is a pure sinusoid (like a harmonic), then the
frequency will be the frequency of that sinusoid and the amplitude will be the
amplitude of the sinusoid. If what is coming through a given channel is not a
pure sinusoid (like two or more sinusoids together, or pure noise), then
the magnitude and frequency functions will not appear to make any sense.
This program can do either of two things - it can do the conversion and write
the amplitudes and frequencies out on another merge file, or it can use the
amplitudes and frequencies to synthesize a tone, possibly multiplying all the
frequencies by a constant factor first. This works well for simple sounds,
but for (for instance) low-pitched resonant male voices, it doesn't work so
well.
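The conversion for a single channel is the usual quadrature-to-polar step: the
amplitude is the magnitude of the (real, imaginary) pair, and the frequency is
the channel's center frequency plus the rate of change of its phase. A sketch
in C, with no claim that this is DFSYN's exact recipe:

  #include <math.h>

  /* One channel, one analysis step.  re/im are the REAL.n/IMAG.n values,
     re0/im0 the values one step earlier, k the channel number, f0 the
     channel spacing in Hz, dt the time between analysis points in seconds. */
  void chan_to_af(double re, double im, double re0, double im0,
                  int k, double f0, double dt, double *amp, double *freq)
  {
      const double PI = 3.141592653589793;
      double dphi = atan2(im, re) - atan2(im0, re0);
      while (dphi >  PI) dphi -= 2.0 * PI;     /* unwrap to (-pi, pi]  */
      while (dphi <= -PI) dphi += 2.0 * PI;
      *amp  = sqrt(re * re + im * im);
      *freq = k * f0 + dphi / (2.0 * PI * dt); /* center + phase slope */
  }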
The program is operated like this:
R DFSYN
Input PV file: <type PV file name here>
Output MF file: <type output file name here, or C.R. if not desired>
Output sound file: <type output sound file name, or C.R.>
Output compression ratio: <see below>
Intermediate compression ratio: <see below>
Maximum channel number: <see below - C.R. works here>
Make new merge file? <always answer "yes" here>
Frequency Multiplier: <ratio or C.R. for 1.0>
At this point, the program will execute for a long, long time, occasionally
burping out bits of text to tell you where it is. There are some general
rules for deciding how to set the various compression ratios. Ideally, we
would set both the output compression ratio and the intermediate compression
ratio to 1. What this means is that each output function (amplitude and
frequency) would have as many points as the original sound file. This gives
the best fidelity, but takes up a great deal of disk space. Since it is
not practical to store this entirely on the disk, we must compress it a bit.
If you give an output compression ratio of 16, then there will be one output
point (per channel) for each 16 input points. The intermediate compression
ratio says how many points will be used internally in the program. If you are
doing synthesis, this will work best if it is set to 1, but other values can
sometimes give reasonable results with much less computer time. If you are
just preparing output for a merge file, there is no reason to make the
intermediate compression ratio anything other than identical to the output
compression ratio.
Another annoying fact is that these compression ratios must divide integrally
into the number of channels (minus one), and that the intermediate compression
ratio must divide integrally into the output compression ratio.
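Spelled out in C, the divisibility rules in the last paragraph amount to this
(the numbers in the example come from the 201-channel case above):

  #include <stdio.h>

  int ratios_ok(int nchan, int outcr, int intcr)
  {
      if ((nchan - 1) % outcr != 0) return 0;  /* outcr must divide nchan-1 */
      if (outcr % intcr != 0)       return 0;  /* intcr must divide outcr   */
      return 1;
  }

  int main(void)
  {
      /* 201 channels: 200 is divisible by 10 but not by 16. */
      printf("%d %d\n", ratios_ok(201, 10, 10), ratios_ok(201, 16, 16));
      return 0;
  }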
When you are done with this program, you can produce a series of PLT files of
the amplitudes and frequencies by using AFPIX (or even EXMRG). The functions
in the merge file are named XXXX.An or XXXX.Fn where the XXXX is the file name
of the input PV file and n is the channel number.
In response to the "maximum number of channels" question, you probably want to
say C.R., which automatically sets it to as many as possible. You can set it
to less than this if you don't think you need the upper channels.
ANALYSIS PROGRAMS 18 FLTCMP
FLTCMP - LINEAR PREDICTOR FILTER COMPUTATION
This program is part of a package for doing work with linear predictive coding
that consists of the programs FLTCMP for computing the predictor coefficients,
FLTAPP to apply the filter to a sound file, PITCH to track the pitch of a
sound file, PDF to make the voiced-unvoiced decision and the silence-sound
decision for a sound file, LPS to synthesize from the pitch, various
decisions, and filter coefficients, and ENORM to normalize the energy of the
resulting synthesis to correspond to the original signal (or anything else).
To use FLTCMP, you do as follows:
R FLTCMP
Input SND file: <type sound file name>
Output K file: <type desired output file name>
Ms between covariance calculations: <type number, like 5, here>
Width of covariance window in Ms: <type number here, like 25>
Order of filter: <see below>
Autocorrelation method? <I favor Y these days>
Now the program will compute for a long time and then stop. It types an
asterisk after every 100 steps. As to what numbers you should put where, the
time between calculations should be as short as possible without compromising
efficiency. A useful number is 5 milliseconds. You just type 5 to the program.
You might try 2 or 1 if you really want faithful tracking, or perhaps 10 or 15
if the exact quality is not so important. As for the width of the covariance
window, for periodic sounds, it should be at least two periods wide (without
knowing any more - there are all sorts of subtle considerations here that I
shall ignore for the time being). For high-pitched sounds, however, this leads
to inordinately short windows. The window should be at least 50 percent wider
than the step size (ms between calculations). The order of the filter is a
complicated thing. If you want it to exactly mimic the sound, pitch and all,
you should have at least 2 orders for every harmonic. This will produce a
filter that will contain both spectral information as well as pitch
information. If you just want it to track the spectral information and not the
pitch information, you should set the order to somewhat less than that. For
low pitched voices, I have used 35 with reasonable success. For high voices,
16 seems to be more reasonable.
It normally does the Burg (harmonic mean) method of linear prediction, but it
also does the standard autocorrelation method, which is somewhat quicker. The
results are very similar, but often the autocorrelation is a bit smoother on
the whole.
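For reference, the heart of the Burg (harmonic mean, maximum entropy)
recursion is quite small. The C sketch below fits a p-th order predictor to
one window of samples; it is the textbook algorithm and carries no claim
about how FLTCMP itself is coded:

  #include <stdlib.h>

  /* Fit a p-th order all-pole predictor A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p
     to x[0..n-1] by Burg's method.  a must have room for p+1 coefficients. */
  void burg(const double *x, int n, int p, double *a)
  {
      double *f = malloc(n * sizeof *f);            /* forward error  */
      double *b = malloc(n * sizeof *b);            /* backward error */
      double *old = malloc((p + 1) * sizeof *old);
      for (int i = 0; i < n; i++) f[i] = b[i] = x[i];
      a[0] = 1.0;
      for (int i = 1; i <= p; i++) a[i] = 0.0;
      for (int m = 1; m <= p; m++) {
          double num = 0.0, den = 0.0;
          for (int i = m; i < n; i++) {
              num += f[i] * b[i - 1];
              den += f[i] * f[i] + b[i - 1] * b[i - 1];
          }
          double k = (den != 0.0) ? -2.0 * num / den : 0.0;  /* reflection */
          for (int i = 0; i <= m; i++) old[i] = a[i];
          for (int i = 1; i <= m; i++) a[i] = old[i] + k * old[m - i];
          for (int i = n - 1; i >= m; i--) {                 /* update errors */
              double fi = f[i];
              f[i] = fi + k * b[i - 1];
              b[i] = b[i - 1] + k * fi;
          }
      }
      free(f); free(b); free(old);
  }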
ANALYSIS PROGRAMS 20 PITCH
PITCH - COMPUTE PITCH OF SOUND FILE
This program computes the pitch of a sound file using straight autocorrelation
with a smattering of statistical decision theory thrown in. It does various
error correction and smoothing on the pitch contour, but still manages to
make errors every now and then. It produces both a MRG file and a text file.
The text file contains a SEG-type description of the pitch contour, but scaled
to a maximum of 1.0. Since the maximum pitch detected is printed out, the
original pitch contour may be recovered by multiplying by this maximum.
It is used like this:
R PITCH
Input SND file: <Type sound file to analyse>
Output P file: <Make up a name. May be same as sound file>
Output TXT file: <Make up a name. Also may be same>
Debug mode: <Type C.R. - Otherwise will do display>
Minimum Frequency: <Type min frequency in Hertz>
Maximum Frequency: <Type maximum frequency in Hertz>
MS between Computations: <5 is a good number>
Correlation threshold: <Type C.R.>
After it terminates, you will have a P file which is in MRG format, and a TXT
file that is in SEG format. There will be no pitches reported outside of the
range you gave for minimum and maximum frequencies, so you better be sure they
are correct. The time between computations is not terribly critical. The less
the better. As low as 1 or 2 Ms takes quite a bit of computer time, but
provides about the smoothest tracing of pitch. The correlation threshold is
the minimum correlation the signal may have. If it has any less correlation
than this, it is assumed to be totally inharmonic, and the pitch reported is
arbitrary. What it does is just linearly interpolate between the last and next
valid pitches.
Inside the MRG file, there will now be three functions with the extensions P,
C, and E, for pitch, correlation, and energy.
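The core of one analysis frame, ignoring all the error correction, smoothing,
and statistical machinery the program adds, looks roughly like this (a sketch
only, not PITCH's actual code):

  #include <math.h>

  /* Search lags corresponding to [minf, maxf] Hz; return the pitch in Hz
     and the normalized correlation at the best lag. */
  double pitch_frame(const float *x, int n, double srate,
                     double minf, double maxf, double *corr)
  {
      int lo = (int)(srate / maxf), hi = (int)(srate / minf);
      if (lo < 1) lo = 1;
      int bestlag = lo;
      double best = -1.0;
      for (int lag = lo; lag <= hi && lag < n; lag++) {
          double num = 0.0, e0 = 0.0, e1 = 0.0;
          for (int i = 0; i + lag < n; i++) {
              num += x[i] * x[i + lag];
              e0  += x[i] * x[i];
              e1  += x[i + lag] * x[i + lag];
          }
          double r = (e0 > 0.0 && e1 > 0.0) ? num / sqrt(e0 * e1) : 0.0;
          if (r > best) { best = r; bestlag = lag; }
      }
      *corr = best;
      return srate / bestlag;
  }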
ANALYSIS PROGRAMS 22 PDF
PDF - DO VOICED/UNVOICED DECISION
This program takes a P file produced by PITCH and decides which parts of the
original file were voiced (had a definite pitch) and which parts were unvoiced
(had no pitch center). It also decides which parts are silence and which parts
are not silence. This is all the lead-in to a linear prediction synthesis
program, LPS, that is described next.
To use the program:
R PDF
Input P file: <type name of file produced by PITCH>
Function name in P file: <probably same as file name>
Correlation Threshold: <Type C.R. to be safe>
Energy Threshold: <Also C.R.>
The decision criterion used is simple thresholding on the energy and on the
correlation coefficient. If the energy is above a certain amount, the
utterance is declared not to be silence. If the energy is high enough and if
the correlation is high enough, it is declared to be voiced, otherwise
unvoiced. On the first run, these thresholds should be left as is by typing C.R.
If you think the results can be improved by changing the thresholds, feel free
to experiment here. A higher correlation threshold means that there will be
less voiced signal. A higher energy threshold means there will be more
silence.
This writes two more functions onto the P file with extensions PH and NH.
These are numbers between 0 and 1 that specify the strength (height) of
the voiced excitation and the height of the noise excitation. These two
excitations are then added together in LPS.
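The decision rule itself is just the pair of comparisons described above; in C
(the threshold values are whatever the program defaults to, which is not
stated here):

  /* energy and corr come from the E and C functions in the P file. */
  typedef struct { int silence, voiced; } decision;

  decision classify(double energy, double corr,
                    double energy_thresh, double corr_thresh)
  {
      decision d;
      d.silence = (energy < energy_thresh);
      d.voiced  = (!d.silence && corr >= corr_thresh);
      return d;
  }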
PROCESSING PROGRAMS 23 INTRODUCTION
INTRODUCTION TO PROCESSING PROGRAMS
Or: How to munge your bits
The processing programs include routines for doing linear prediction
synthesis, energy normalization, sound file mixing, filtering, and many more.
We have already discussed S, which has some processing functions as well as
analysis functions.
LPS does linear prediction synthesis from data produced by FLTCMP, PITCH, and
PDF (or any editing of these data). ENORM makes the energy of the resulting
synthetic sound correspond to that of the original. FLTAPP is useful for doing cross-
synthesis (like Tracy Petersen's work), producing the error function, or just
whitening a signal. MIXSND is the sound file mixer and is probably one of the
most important sound processing programs. It is capable of overlapping
multiple copies of various sound files all simultaneously. It can extract
pieces of sound files for overlay.
PROCESSING PROGRAMS 24 LPS
LPS - LINEAR PREDICTION SYNTHESIS
This program takes a MRG file produced by PITCH and PDF (a P file) and
synthesizes a sound file from it. It takes the pitch contour from the P
function in the P file, the pulse train amplitude from the PH function, and
the noise (Gaussian) amplitude from the NH function. In addition, you may
specify a frequency multiplication factor which just scales all the
frequencies involved. If the filter order is too high, this sometimes
produces an objectionable buzzing due to the "beating" of the original pitch
and the altered pitch.
It takes a FUNC file to specify time and frequency warping. If these are not
specified, unity is taken for both. For example, if the time warping is set to
2.0 throughout, then the resulting sound will be exactly twice as long as the
original.
To call:
R LPS
Input K file: <type name of K file from FLTCMP>
Function name for K parameters: <usually the same as the K file name>
Maximum number of coefficients: <C.R. gets all of them>
Input P file: <type name of P file from PITCH and PDF>
Function name for P parameters: <probably same as file name>
Output SND file: <make up a file name>
Warp function file (CR to finish): <type FUNC file name or CR>
<prints out names of functions in file, returns to above question>
Time warp function: <type name of function or C.R. for no time warp>
Frequency warp function: <type name of function or C.R. for no freq warp>
All voiced? <Y suppresses all frication>
All unvoiced? <If answer above was CR, Y makes whispered speech>
There exists a gain function. Use it? <Usually no>
Just use gain term? <Usually Yes>
This business at the end determines how the energy is normalized. There is
period-by-period exact normalization, and there is dead-reckoning
normalization. The exact normalization is the most precise, but doesn't always
work because the filter doesn't always exactly model the speech. Probably the
smoothest and safest is to not use the gain function but just use the gain
term. This causes the criterion to be the actual prediction error (rather
than the computed error term you get from the autocorrelation method) and
causes the dead-reckoning method to be used. This is both fast and smooth, but
can sometimes produce unnatural output strengths.
Some tips about linear prediction in general. First, the sounds are much
better if you put some reverberation on them. That takes the knife-edged
tonality out. Next, the algorithm is quirky. Sometimes it doesn't work right
and sometimes it does, and there is little way to tell when. You just have to
experiment with it. Sometimes the same passage just said differently will
produce better results. If you can't get good results for some speech, skip it
and try to find a better one. You can improve the voiced-unvoiced decisions (a
necessary step) by editing the functions in the .P file. Remember, the speech
sounds much better in context.
For the order of the filter, you can use the rule of thumb of 45 coefficients
for deep male voices and 25 for sopranos and in between for in between.
PROCESSING PROGRAMS 26 ENORM
ENORM - ENERGY NORMALIZATION
This program normalizes the energy of a sound file processed by FLTAPP. It
does this by comparing the energy to the original energy as recorded in the K
file as function RMS. You run it as follows:
R ENORM
Input FLT file: <sound file name to be normalized>
Input K file: <K file from FLTCMP>
Function name for K parameters: <probably same as file name>
Output SND file: <file name for output 12-bit file>
Time expansion ratio: <C.R. or number, like 1 or 1.5>
If you used a time expansion ratio in FLTAPP, then you ought to use the
same number here. It is not necessary for LPS because that already
does energy normalization. This program can also be used to just put the
envelope of one sound (from the K file) on another. Just run off a K
file at some very small order, like 2 or 4, for the sound whose envelope
you want to track, then you can apply that to any other sound file.
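Per frame, the idea reduces to matching RMS values; a sketch in C (not ENORM's
actual code), where target_rms stands for the value recorded in the K file's
RMS function, scaled however is appropriate:

  #include <math.h>

  /* Scale one frame of n samples so that its RMS matches target_rms. */
  void enorm_frame(float *y, int n, double target_rms)
  {
      double s = 0.0;
      for (int i = 0; i < n; i++) s += (double)y[i] * y[i];
      double rms = sqrt(s / n);
      double g = (rms > 0.0) ? target_rms / rms : 0.0;
      for (int i = 0; i < n; i++) y[i] = (float)(y[i] * g);
  }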
PROCESSING PROGRAMS 27 FLTAPP
FLTAPP - APPLY LINEAR PREDICTION FILTER
This program applies a filter (as computed by FLTCMP) to a sound file, either
in the inverse form or the forward form. The inverse form will remove
resonances ("whiten" the file). The forward form will impose resonances. For
instance, to do Petersen-style cross-synthesis, you would produce an
excitation source by taking some sound file, computing the filter for it with
FLTCMP, then filtering it with the inverse filter using FLTAPP. This
excitation source can then be filtered by the forward filter from another
sound file. In this manner, we can impose the spectral shape of, say, speech,
on the pitch and articulation contour of some other sound file.
R FLTAPP
Input SND file: <type file name>
Input K file: <type K file name>
Output SND file: <type file name>
Number of coefficients: <C.R. or number>
Do you want the inverse filter? <Y or N, depending>
Starting time in sound file: <C.R. or time in seconds>
Time expansion ratio: <C.R. or factor, like 1.5 or 2>
This program produces a floating point file, so you have to run it through
either S or ENORM to convert it to 12-bit fixed point.
The only way to really understand the effect this program has is to run it on
a few things and listen to the results.
The starting time in the sound file is for when, for instance, you are
applying speech formants to a sound file, but not to the entire sound file,
only to some small portion. In this case, you select the
beginning time of the portion with this starting time and you select the
duration with the time expansion ratio. This will expand (or shrink) the
duration of the speech data. >1 expands, <1 shrinks.
To do real cross-synthesis, usually the process is to take an instrument tone
and whiten it by filtering it with its own inverse filter of low order (4 or
6). Take this whitened sound and then filter with the forward filter for the
speech sound that you are imposing (usually of high order like 35 or 45). The
whitening improves the intelligibility of the speech immensely. You can get
all degrees between the instrument and the voice by increasing the whitening
and increasing the order of the vocal filter. Things that are already white
(like cymbal crashes or ocean sounds) probably don't need much inverse
filtering. This all works much better if the instrumental and voice sounds are
exactly matched, like done at the same time. For instance, you can take a
speech sound, then using our marvelous simultaneous play and record feature in
ADUDP, record a sax line that matches the speech. You might want to produce
the speech at different durations also, like 2X or 4X. You can do this with
LPS, which will not give ultimate quality, but will give you some speech that
you can sync the sax with. This matched instrument and voice seems to give the
best results.
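In terms of the predictor coefficients a[1]..a[p] (with a[0] = 1), the inverse
and forward filters are just an FIR and its all-pole inverse. A sketch in C,
with the usual caveat that sign conventions vary and this is not FLTAPP's
actual code:

  /* Inverse (whitening): e[n] = x[n] + a[1]x[n-1] + ... + a[p]x[n-p]
     Forward (resonant):  y[n] = e[n] - a[1]y[n-1] - ... - a[p]y[n-p] */
  void inverse_filter(const float *x, float *e, long n, const double *a, int p)
  {
      for (long i = 0; i < n; i++) {
          double s = x[i];
          for (int k = 1; k <= p && k <= i; k++) s += a[k] * x[i - k];
          e[i] = (float)s;
      }
  }

  void forward_filter(const float *e, float *y, long n, const double *a, int p)
  {
      for (long i = 0; i < n; i++) {
          double s = e[i];
          for (int k = 1; k <= p && k <= i; k++) s -= a[k] * y[i - k];
          y[i] = (float)s;
      }
  }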
PROCESSING PROGRAMS 29 SRCONV
SRCONV - CHANGE SAMPLING RATES
This program will change the sampling rate of a file by any integral ratio,
like 2/1, 3/2, 201/200, and the like. This is like speeding up or slowing
down the tape. The duration will be changed along with the pitch. The output
file is in 36-bit floating point and must be converted to integer by either S
or ENORM or something.
The program asks you for two numbers. These are the numerator and the
denominator of a ratio. The denominator refers to the output file and the
numerator refers to the input file. So, for instance, to take every other
sample, or effectively double the sampling rate, you should type 1/2
(numerator of 1, denominator of 2). This will give you a file half as long.
R SRCONV
Input SND file: <input file here>
Output SND file: <output file name here>
Numerator: <Corresponds to input file>
Denominator: <Corresponds to output file>
Note that by use of any pitch changing method (such as DFSYN or LPS) and this
routine, you can effectively change the duration of the file without changing
the pitch. For instance, if you change the pitch by a factor of 2 (put it up
an octave), then change the sampling rate by a factor of 2/1 (doubles the
number of samples), then you get the original pitch back, but the duration is
now doubled. This isn't really the best way to do this. The duration change
should actually be worked into the synthesis routines.
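Whichever way the numerator and denominator are assigned, the underlying
operation is resampling by a rational factor. A linear-interpolation sketch in
C is below; a real sample-rate converter (SRCONV presumably included) uses
better interpolation than this, so treat it only as a picture of the idea:

  /* Resample x[0..nin-1] by the rational factor num/den using linear
     interpolation.  y must have room for roughly nin*den/num samples;
     the number of output samples actually written is returned. */
  long resample(const float *x, long nin, float *y, long num, long den)
  {
      long nout = 0;
      for (long j = 0; ; j++) {
          double pos = (double)j * num / den;    /* position in the input */
          long   i   = (long)pos;
          if (i + 1 >= nin) break;
          double frac = pos - i;
          y[nout++] = (float)((1.0 - frac) * x[i] + frac * x[i + 1]);
      }
      return nout;
  }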
PROCESSING PROGRAMS 30 HEADER
HEADER - PUT HEADER ON SOUND FILE, OR EDIT EXISTING HEADER
This program allows you to put a header on a sound file that doesn't already
have one. It also allows you to alter the header if one already exists. It is
pretty self-explanatory.
DISPLAY PROGRAMS 31 INTRODUCTION
DISPLAY PROGRAMS
This section is sort of a catch-all for programs that don't do any analysis.
These are mostly just programs that examine MRG files, since MRG files are a
bit opaque in general.
There is EXMRG, which can examine any function in any MRG file. It is capable
of looking at only a portion of a function in a MRG file also.
AFPIX assumes that the MRG file has been produced by DFSYN and pairs the
amplitude and frequency functions into one picture. This one runs straight
through and produces pictures (and plot files) of all the functions without
you saying a thing.
DISPLAY PROGRAMS 32 EXMRG
EXMRG - EXAMINE MERGE FILES
This program displays functions in MRG files.
R EXMRG
MRG file name: <type in file name here>
<it may type out the names of the functions now>
Function name: <type desired function name>
Beginning time: <C.R. for beginning of function>
Ending time: <C.R. for all the way to the end>
<it displays the function now>
<type C.R. to display another function>
Each time it does a display, it writes out another plot file. It starts with
the file name MRG1.PLT and increments the number each plot. Thus, using the S
program ($D command), you can look at the plot files.
DISPLAY PROGRAMS 33 AFPIX
AFPIX - DISPLAY MAGNITUDE AND FREQUENCY FUNCTIONS
For MF files that were produced by DFSYN, this program displays some number of
channels worth, putting a single channel, amplitude and frequency, on each
display. This writes out a series of PLT files also.
R AFPIX
Input MF file: <Type name of file>
Function name: <Probably same as file name>
Plot only? <If you don't want display, just plot files, type Y>
Starting channel: <1 is a good number>
Ending channel: <as high as you want to see>
Count by: <probably want 1 here>
It will type out each file name as it writes it out on the disk.
EDITING PROGRAMS 34 INTRODUCTION
EDITING PROGRAMS
We have already mentioned S, which is capable of doing some minor editing.
Here we will mention REVED, the reverberation editor, EDSND, the sound file
editor, and FUNED, the function editor. EDSND allows a user to break up a
sound file into segments either by thresholding the energy of the signal, or
by visual inspection of the file, to hear selected segments of the file in any
order, and to copy the selected segments in any order to an output file. REVED
allows manipulation of reverberation parameters and display of the impulse
response of the resulting reverberator. This seems to be the only convenient
way to design a reverberator. FUNED allows editing of SEG-type functions, as
well as conversion between MRG format and SEG format.
EDITING PROGRAMS 35 FUNED
FUNED - PIECEWISE-LINEAR FUNCTION EDITOR
When the program is first entered, it prints out a list of the functions. It
then asks "Command or Function to modify". You then type (in full) either a
command or a function name. If you type a function name, the function editor
(EDFUN) will be called with that function as an argument. If there isn't a
function of the name that you typed, the name will be examined for whether it
is a recognized command. The following commands are interpreted:
EXIT - Leave FUNED. <ALT> also works.
PRINT - Prints out the names of the functions.
INPUT - Input a SEG-type function file. Asks for a file name and appends the
new function list onto the old one. You then have a longer function list. If
any of the new functions have the same name as the old ones, you have a little
bit of a problem, of course. Any search for that function name will just grab
the first one in the list. You can, however, rename them.
WRITE - Asks for a file name and writes out the entire function list on a
file.
DELETE - Asks for a function name and deletes it from the list.
RENAME - Asks for an old function and a new function name and renames it.
MERGE - This interfaces with the merge-file routines. It asks you for a merge
file name. If there is a merge file by that name out there, it prints a list
of the names of all the functions therein. It then asks you for a merge-file
function name. If you type <CR>, you will exit this command and get back to
the "Command or Function to modify" prompt. If the merge-file function name
you typed can be found, it then asks you "Straight or Piecewise-Linear?". If
you respond "S" to this, it just takes the merge file function and makes it
into a SEG function without any loss of data. That is, there will be as many
breakpoints in the function as there are points in the original function. It
will ask you for a new name for the SEG function. If you would like some data
reduction, you should respond "P". You will then get into another little
command loop that can accept four different commands: <CR> adds another line
segment to the approximation, "↑" subtracts a segment from the approximation,
"Y" stops here and makes up a record corresponding to the selected
approximation, and <ALT> aborts the process and forgets about it. If you get
through "Y", it asks you for a new SEG function name and calls the new record
that. It then goes back to asking you for another merge-file function name.
<CR> at that point gets you back to the main command loop.
There are plans in the works for this routine to do function plotting, other
forms of display, and limited sound-file analysis too, and anything else
anyone feels would be useful.
If you just type the name of a function, rather than any of the above special
commands, you get into EDFUN, the function editor. This is a minimal
function editor. It allows hand-modification of SEG-type functions. When the
routine is first entered, it displays three basic things: the function itself
with horizontal and vertical axes, a list of the breakpoints (or at least as
many as it can get on the screen), and a "cursor", which looks for all the
world like a great big sharp sign. This cursor surrounds the "current"
breakpoint, which is initialized to be the first breakpoint.
This program accepts simple commands that consist of a repeat argument and an
activation character. If the repeat argument is missing, it is taken to be
one. If included, it specifies the number of times the command is to be
repeated. The special symbols "∞" or "*" stand for the number of breakpoints
in the function. The activation character is a single character with <CONTROL>
(hereafter abbreviated as "α"), or <META> (hereafter abbreviated as "β") on.
Some commands accept a binary scale on the repeat argument coded in the
control, meta, and sometimes top bits. This will be denoted as "λ" below.
(Note: at IRCAM right now, the effect of α and β (control and meta keys) must
be simulated with ALTs. One ALT means α, two alts mean β, and three alts mean
αβ. I haven't figured out yet how to get a single ALT in as a character,
though. Hmmm.)
First, to get out of the editor, you type αE, which updates your record and
exits. If you want to get out without updating the record, type <ALT>. If you
want to update the record without exiting, type α. (that's right,
<CONTROL><PERIOD>). To reset the display from whatever is in the record, type
αβ<BS> or αβO (sort of like αXCANCEL in E). If you are doing any hairy
complicated change, you will probably want to α. a lot.
To make any modification, you move to the breakpoint you want to change, and
you change it. Breakpoint moving commands are "←" and "→" for move left and
right. Breakpoint changing commands are many. Each time you move a breakpoint,
the values at that point are saved away, and you can get them back with αR
(restore). This allows you to make any stupid move and as long as you haven't
moved to a different breakpoint, you can reset the value to what it was
before.
There is αD which deletes the current breakpoint. A repeat argument means to
delete that many breakpoints. If you ask to delete more than 4, it will ask
you to confirm the massive delete.
Also, αI invents a new breakpoint and puts it on top of the old one. A repeat
argument creates that many new breakpoints. You can then move them around to
wherever you want.
αB breaks the line segment after the current breakpoint into the number of
pieces specified by the repeat argument. It will look for all the world as if
nothing has happened, because what it does is slide down the line segment
and insert that many new breakpoints along the line. You can then move through
and change them to wherever you please.
αC is to type in a new value for the breakpoint. A repeat argument says to do
it to that many consecutive breakpoints. It loads the current value of the X
and then Y coordinates of the breakpoints into your line editor. You should
type <CR> if you don't want to change them, or edit them however you wish. The
only problem here is that you have to be sure and wait for the line editor to
get loaded (Hic!) before typing <CR>, else the program will get very confused.
At IRCAM, it doesn't (yet) load your line editor.
To move breakpoints, you use the commands λ(, λ), λ/ or λ\. To figure out what
this is about, put your four fingers (thumb excluded) over the keys ()/ and \.
They then are left, right, up, and down, with the strength specified by the
control, meta, and top keys, presumably operated with the left hand. A repeat
argument specifies more strength in the motion. If the minimum strength is too
much, you can change it with the α* and α⊗ commands. These cause the strength
to be halved or doubled respectively. The strength is initially set to 8.0,
and is in terms of raster units on the screen. α( will move the breakpoint to
the left by eight raster units (unless you change the scale from its default
of 8). αβ[ will move the breakpoint to the left by 512 raster units (I
think!).
αN is used to normalize a function to be like a classical FUNC SEG function.
That is, it normalizes the horizontal axis to go from 1 to 100, and the
vertical axis to go from 0 to 1. If you are using this editor to set up stuff
for the music program, you probably ought to normalize the function sooner or
later.
That's about it. The key commands are αE to get out, α→ and α← to move to
different breakpoints, the ()/\ commands for changing breakpoints, as well as
αC for reading in new coordinates, the αI and αB commands for making up new
breakpoints, and αD for getting rid of breakpoints. What could be simpler?